In part 1, we established that:
From the overview of Part 1, we can now pursue this microbial mechanism, and identify candidate microbial characteristics to use as indicators for “good” communities - i.e. putative causal agents. The workflow is:
If microbes are a plausible mechanism for suppressed corn and soybean yield in the 2-year rotation, we will move to interpreting the putative causal agents using FUNGuild and picrust2.
Unlike Part 1, this analysis will be completed on a crop-by-crop basis.
We’ll process both 16S and ITS data as compositional data. The characteristics of this are:
As a result, the analysis needs to preserve a few properties to be robust (see Greenacre and Lewi (2009)):
Weighted log-ratio analysis is the ordination approach to handling such data. Because it requires logarithms, it requires some way to handle zeros in the dataset, ideally that also preserves relationships among observed data. Martin-Fernandez et al. (2015) develop an imputation approach that accomplishes this goal, and implement it in the package, zCompositions.
We still want to remove extremely rare taxa - those that are so rare their varaince can’t be estimated - and samples with so few taxa we can’t estimate variance on sample relationships to other samples. We will retain samples with at least 3 taxa, and taxa appearing in at least 3% of samples - which works out to 5 for this 160 sample dataset.
We also want to account for sampling depth, so samples will be weighted based on log10 total reads.
Packages and Custom Scripts
# Custom functions
functions = c('Convenience Functions.R',
# 'Ordination Functions.R',
'Part 2 - Helper Functions.R')
source_functions = function() {
require(here)
for (i in functions) source(here('Scripts', i))
}
source_functions()
# Libraries
libs = c('magrittr',
'here',
'zCompositions', # for zero imputation
'easyCODA', # for CLR transformation
'phyloseq', # convenient object management
'lavaan', # SEM
'lavaanPlot', # SEM plots?
'semPlot', # ugly but effective plots
'ggplot2'
)
load_libs(libs) # load libraries, and install if necessary
#pcwOrd
load_pcwOrd = function() {
here('Scripts', 'pcwOrd_0.2.1.Rdata') %>%
load(envir=.GlobalEnv)
}
load_pcwOrd()
# pcwOrd permutations
nperms = 9999
# lavaan option
sem_error = 'bootstrap'
# sem_error = 'default'
Data Sources
in_plot = 'Master Plot Data.csv'
in_bac_counts = 'Master 16S Counts.csv'
in_bac_taxonomy = 'Master 16S Taxonomy.csv'
in_fun_counts = 'Master ITS Counts.csv'
in_fun_taxonomy = 'Master ITS Taxonomy.csv'
in_funguild = 'FUNGuild Taxonomy.guilds.csv' # direct output from FUNGuild converted to .csv in MSExcel
in_obj = c('Contrasts.Rdata',
'Pretty Names.Rdata')
Plot data, as before.
plot = here('Data', in_plot) %>%
read.csv(stringsAsFactors=FALSE)
plot[, 1:14] %<>% lapply(as.factor)
corner(plot)
bac_ps = load_to_phyloseq('BACTERIA_ID', # Part 2 helper functions. Includes zero replacement.
in_bac_counts,
in_bac_taxonomy,
plot,
min_tax=3,
min_occurrances=8,
min_reads=1)
## No. corrected values: 2418
# remove Bradyrhizobium
rw = attr(bac_ps, 'row_weights')
bac_ps %<>% subset_taxa(Genus != 'Bradyrhizobium')
otu_table(bac_ps) %<>%
close_matrix %>%
otu_table(taxa_are_rows=FALSE)
attr(bac_ps, 'row_weights') = rw
fun_ps = load_to_phyloseq('FUNGI_ID',
in_fun_counts,
in_fun_taxonomy,
plot,
min_tax=3,
min_occurrances=8,
min_reads=1) # per sample
## No. corrected values: 15156
phylo_obj = list(`16S` = bac_ps,
`ITS` = fun_ps)
print(phylo_obj)
## $`16S`
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 370 taxa and 157 samples ]
## sample_data() Sample Data: [ 157 samples by 53 sample variables ]
## tax_table() Taxonomy Table: [ 370 taxa by 6 taxonomic ranks ]
##
## $ITS
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 433 taxa and 158 samples ]
## sample_data() Sample Data: [ 158 samples by 53 sample variables ]
## tax_table() Taxonomy Table: [ 433 taxa by 7 taxonomic ranks ]
For comparison, the original ITS data (sent through ITSxpress) was:
$ITS
phyloseq-class experiment-level object
otu_table() OTU Table: [ 201 taxa and 148 samples ]
sample_data() Sample Data: [ 148 samples by 53 sample variables ]
tax_table() Taxonomy Table: [ 201 taxa by 7 taxonomic ranks ]
FUNGuild
funguild = here('Data', in_funguild) %>%
read.csv(stringsAsFactors=FALSE, row.names=1) %>%
.[rownames(tax_table(fun_ps)), ]
corner(funguild)
Contrasts and nice names for plotting
for (i in in_obj) load(here('Data', i), envir=.GlobalEnv)
lapply(contrasts, round, 3)
## $corn
## COWS CPWS CSCS CSSwP CSSwSf
## ROTATION_LENGTH -0.250 -0.250 1 -0.250 -0.25
## FOLLOW_SOY 0.500 0.500 0 -0.500 -0.50
## FOLLOW_SUNFLOWER -0.333 -0.333 0 -0.333 1.00
##
## $soy
## COWS CPWS CSCS CSSwP CSSwSf
## ROTATION_LENGTH -0.25 -0.25 1 -0.25 -0.25
## FOLLOW_CORN -0.50 -0.50 0 0.50 0.50
## WITH_PEA -0.50 0.50 0 0.50 -0.50
plot_out = matrix(nrow=0, ncol=6)
colnames(plot_out) = c('CROP', 'SAMPLING', 'CONTRAST', 'BARCODE', 'BIOMASS', 'SCORE')
stats_out = matrix(nrow=0, ncol=20)
colnames(stats_out) = c('CROP', 'SAMPLING', 'CONTRAST', 'BARCODE', 'ORD_F', 'ORD_DF', 'ORD_P', 'YEAR_F', 'YEAR_DF', 'YEAR_P', 'YEAR_COEF', 'YEAR_SE', 'CONTRAST_COEF', 'CONTRAST_SE', 'CONTRAST_T', 'CONTRAST_P', 'AXIS_COEF', 'AXIS_SE', 'AXIS_T', 'AXIS_P')
# initiate data.frames for storing results
full_ord_results = update_full_ord_results()
contrast_ord_results = update_contrast_ord_results()
taxa_results = update_taxa_results()
ord_ls = list() # for storing ordinations, including permutations.
sem_ls = list()
The workflow is, for each subset of crop, sampling time, and community:
Due to low sample sizes (N=8 for each treatment * crop * sampling time combination across years), marginal effects (0.1 > p > 0.05), and taxa that individually respond to contrasts at 40% false discovery rates will be further investigated. This is reasonable as the goal in selecting taxa is to reduce noise in the dataset. This is a low-effort way to separating taxa that seem to respond to rotation from the ones that don’t.
As with univariate tests, we’ll first test whether our response, the microbial community, responds to the overall treatments. If so, we will test the contrasts that were also predictive of plant health (biomass). The dataset will be reduced to those taxa responding to the contrast
The output for each crop and stage will be a graph of:
Plus a sem graph showing the relationships leading to biomass.
First we see if these communities respond to the treatment.
crop = 'corn'
sampling = 'seedling'
constraint = 'ROTATION'
partial = 'YEAR'
barcode = '16S'
contrast = 'None'
nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_name = paste(crop, sampling, barcode, '\npartial, constrained')
# Part 2 - Helper Functions.
#' Subsets phylo object by crop and sampling.
#' Builds Y = otu_table(ps_obj), X = sample_data(ps_obj)[constraint], and Z = sample_data(ps_obj)[partial] matrices
#' optionally uses a contrast for X as defined in contrast
#' Performs partial constrained weighted log-ratio analysis with pcwOrd::pcwOrd:
#' - row weights are log10 total reads
#' - column weights are mean relative abundances
#' - otu_table is center-log-ratio-transformed by easyCODA::CLR
#' Produces diagnostic plots and returns the pcwOrd object.
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name)
# pcwOrd function
perms = permute_ord(ord, metric='model_F', times=nperms, permute_on='partial')
Here is the model PERMANOVA (pseudo-F) test, based on 9999 permutations.
full_ord_results %<>% update_full_ord_results(crop, sampling, barcode, perms)
## CROP SAMPLING BARCODE RSQ F_OBS Df P_VAL
## 1 corn seedling 16S 0.1085067 1.143856 5,34 0.0638
ord_ls[[nn]] = perms # structure includes: permutation results, observed data (pcwOrd object).
ROTATION explains 10.851 of corn 16S community variation at seedling (p = 0.064). Therefore, we will test contrasts with the 16S dataset.
barcode = 'ITS'
nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_name = paste(crop, sampling, barcode, '\npartial, constrained')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name)
perms = permute_ord(ord, metric='model_F', times=nperms, permute_on='partial')
full_ord_results %<>% update_full_ord_results(crop, sampling, barcode, perms)
## CROP SAMPLING BARCODE RSQ F_OBS Df P_VAL
## 1 corn seedling ITS 0.1584024 2.07696 5,34 1e-04
ord_ls[[nn]] = perms # structure includes: permutation results, observed data (pcwOrd object).
ROTATION explains 15.8402393 of corn ITS community variation at seedling (p = 10^{-4}). Therefore, we will test contrasts with the ITS dataset.
contrast = 'FOLLOW_SOY'
barcode = '16S'
contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')
nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr)
perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)
contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
## CROP SAMPLING BARCODE CONTRAST RSQ F_OBS Df P_VAL
## 1 corn seedling 16S FOLLOW_SOY 0.02237139 0.9274939 2,37 0.7581
ord_ls[[nn]] = perms
16S communities did not respond to ROTATION at seedling (p = 0.76). We will not test it further or reduce features. We’ll just save the scores for plotting.
barcode = 'ITS'
contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')
nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr)
perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)
Our contrast has a negative score on the constrained axis, so we will invert it.
contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
## CROP SAMPLING BARCODE CONTRAST RSQ F_OBS Df P_VAL
## 1 corn seedling ITS FOLLOW_SOY 0.03801038 1.829664 2,37 0.0286
ord_ls[[nn]] = perms
ITS communities did respond to ROTATION at seedling (p = 0.029).
Here are the axis scores so far. Note we will NOT include 16S in the model.
response = 'AV_DW'
find_ords = c(crop, sampling, contrast) %>%
sapply(grepl, names(ord_ls)) %>%
apply(1, all)
local_ords = ord_ls[find_ords] %>%
lapply(extract2, 'observed')
sem_name = paste(crop, sampling, contrast, sep='_')
#' combine axis (standard) scores plus response, contrast, and partial data into a single data.frame for modeling.
#' contrast_row is the row of contrasts[[crop]]. Must be a matrix.
#cc = contrasts[[crop]][contrast, , drop=FALSE]
mod_df = generate_prediction_df(local_ords,
crop,
sampling,
contrasts[[crop]][contrast, , drop=FALSE],
partial,
constraint,
response,
plot,
auto_invert=TRUE) # if contrast and axis are negatively correlated, for consistency.
head(mod_df)
The SEM models are:
no_mediation: A null model where AV_DW responds only to FOLLOW_SOYpartial_mediation: A full model where AV_DW responds to both FOLLOW_SOY and the communityfull_mediation: An intermediate model where AV_DW responds only to the communityIn all models, YEAR is included as a predictor of AV_DW, as well.
Errors will be estimated by bootstrapping 1000 times, missing values using maximum likelihood imputation, and the overall models by maximum likelihood.
mod_df %<>% na.omit
no_mediation = {
"
# Model
AV_DW ~ YEAR + c*CONTRAST
AXIS_ITS ~ a*CONTRAST
AXIS_ITS ~~ 0*AV_DW
# Calculate pathways for easy interpretation
direct := c
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
partial_mediation_ITS = {
"
AV_DW ~ YEAR + b*AXIS_ITS + c*CONTRAST
AXIS_ITS ~ a*CONTRAST
direct := c
total := c + (a*b)
indirect := a*b
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
full_mediation_ITS = {
"
AV_DW ~ YEAR + b*AXIS_ITS# + c*CONTRAST
AXIS_ITS ~ a*CONTRAST
# direct := c
# total := c + (a*b)
indirect := a*b
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
model_compares = anova(no_mediation, partial_mediation_ITS, full_mediation_ITS) %T>%
print
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## partial_mediation_ITS 1 150.44 163.96 0.0049
## no_mediation 2 163.60 175.42 15.1584 15.153 1 9.911e-05
## full_mediation_ITS 2 148.84 160.66 0.3981 -14.760 0
##
## partial_mediation_ITS
## no_mediation ***
## full_mediation_ITS
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
keep_mod = which.min(model_compares$AIC) %>%
rownames(model_compares)[.]
sem_ls[[sem_name]] = list(no_mediation = no_mediation,
partial_mediation_ITS = partial_mediation_ITS,
full_mediation_ITS = full_mediation_ITS,
df = mod_df)
The best model is full_mediation_ITS.
mod_summ = summary(get(keep_mod), std=TRUE)
## lavaan 0.6-5 ended normally after 26 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 7
##
## Number of observations 40
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 0.398
## Degrees of freedom 2
## P-value (Chi-square) 0.820
##
## Parameter Estimates:
##
## Standard errors Bootstrap
## Number of requested bootstrap draws 1000
## Number of successful bootstrap draws 1000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## AV_DW ~
## YEAR -1.598 0.141 -11.356 0.000 -1.598 -0.811
## AXIS_ITS (b) -0.375 0.097 -3.858 0.000 -0.375 -0.376
## AXIS_ITS ~
## FOLLOW_SOY (a) 1.522 0.253 6.023 0.000 1.522 0.689
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .AV_DW 2.397 0.258 9.293 0.000 2.397 2.433
## .AXIS_ITS -0.000 0.115 -0.000 1.000 -0.000 -0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .AV_DW 0.195 0.054 3.586 0.000 0.195 0.201
## .AXIS_ITS 0.512 0.095 5.389 0.000 0.512 0.525
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## indirect -0.570 0.178 -3.205 0.001 -0.570 -0.259
This is an excellent model.
lavaanPlot(model=get(keep_mod), coefs=TRUE)
This is a marginal plot, where year effects are removed for clarity of the axis effects.
select_barcode = 'ITS'
#' plot response ~ barcode_axis based on data in mod_df created above.
#' add shapes for partial and colors for constraint
#' response is optionally the residuals of response ~ partial.
#' returns a ggplot object for easy post-hoc formatting
nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_effect(select_barcode, response, partial, constraint, contrast, mod_df, partial_response=TRUE, nn)
The taxa that load most credibly onto this axis are:
select_barcode = 'ITS'
is_ITS = select_barcode=='ITS'
if (is_ITS) {
depth = 'Species'
guilds = funguild
} else {
depth = 'Genus'
guilds = NULL
}
ord_obj = paste(crop, sampling, select_barcode, contrast, sep='_') %>%
ord_ls[[.]]
#' Select the features in feature_F of ord_permutations$results that have an adjusted
#' p-value < max_p_adjust. Name these taxa based on tax_table(plylo_obj).
#' Include additional info like guilds and (contribution, sensu Greenacre 2013) scores.
top_tax = id_top_tax(ord_obj,
phylo_obj[[select_barcode]],
metric='feature_F',
phylo_depth=depth,
max_p_adj=1,
max_p_val=1,
guilds=funguild,
auto_invert=TRUE) %>%
subset(pcAxis1^2 > 0.01)
tt_sort = order(top_tax$pcAxis1, decreasing=TRUE)
# subset(top_tax[tt_sort, ], pcAxis1^2 > 0.005)
# top_tax[tt_sort, ]
# tt_sort = order(top_tax$pcAxis1, decreasing=TRUE)
# top_tax %<>% .[tt_sort, ] %T>% print# %>%
# subset(pcAxis1^2 > 0.005) %T>%
# print
# subset(top_tax[tt_sort, ], p_val < 0.05)
taxa_results %<>% update_taxa_results(crop, sampling, select_barcode, contrast, top_tax[tt_sort, ])
## CROP SAMPLING BARCODE CONTRAST ASV PHYLUM
## 1 corn seedling ITS FOLLOW_SOY ASV_9 Ascomycota
## 2 corn seedling ITS FOLLOW_SOY ASV_15 Ascomycota
## 3 corn seedling ITS FOLLOW_SOY ASV_17 Ascomycota
## 4 corn seedling ITS FOLLOW_SOY ASV_8 Ascomycota
## 5 corn seedling ITS FOLLOW_SOY ASV_183 Ascomycota
## 6 corn seedling ITS FOLLOW_SOY ASV_13 Ascomycota
## 7 corn seedling ITS FOLLOW_SOY ASV_54 Ascomycota
## 8 corn seedling ITS FOLLOW_SOY ASV_3 Ascomycota
## 9 corn seedling ITS FOLLOW_SOY ASV_43 Ascomycota
## 10 corn seedling ITS FOLLOW_SOY ASV_71 Ascomycota
## 11 corn seedling ITS FOLLOW_SOY ASV_130 Olpidiomycota
## 12 corn seedling ITS FOLLOW_SOY ASV_62 Olpidiomycota
## 13 corn seedling ITS FOLLOW_SOY ASV_25 Ascomycota
## 14 corn seedling ITS FOLLOW_SOY ASV_69 Chytridiomycota
## 15 corn seedling ITS FOLLOW_SOY ASV_42 Olpidiomycota
## TAXA
## 1 Unknown Trichoderma
## 2 Unknown Drechslera
## 3 Gibberella acuminata
## 4 Unknown Trichoderma
## 5 Unknown Diaporthe
## 6 Clonostachys rosea
## 7 Unknown Microdochium
## 8 Unknown Helotiales
## 9 Unknown Chaetothyriales
## 10 Unknown Paraphaeosphaeria
## 11 Olpidium brassicae
## 12 Olpidium brassicae
## 13 Dactylonectria macrodidyma
## 14 Unknown Chytridiaceae
## 15 Olpidium brassicae
## GUILD AXIS
## 1 Endophyte-Epiphyt-Fungal Parasite-Plant Pathogen-Wood Saprotroph 0.3821456
## 2 Plant Pathogen 0.3485287
## 3 Plant Pathogen 0.2347919
## 4 Endophyte-Epiphyt-Fungal Parasite-Plant Pathogen-Wood Saprotroph 0.1558201
## 5 Endophyte-Plant Pathogen 0.1442146
## 6 Plant Pathogen 0.1182411
## 7 Endophyte-Plant Pathogen 0.1051201
## 8 - -0.1045408
## 9 - -0.1226169
## 10 Undefined Saprotroph -0.1324429
## 11 Plant Pathogen -0.1337711
## 12 Plant Pathogen -0.1404569
## 13 - -0.1575138
## 14 - -0.2395906
## 15 Plant Pathogen -0.4016240
## RSQ F_OBS P_VAL P_ADJ
## 1 0.066253903 4.1951582 0.0510 0.5295923
## 2 0.036786539 1.4130944 0.2418 0.7321636
## 3 0.085186102 4.5277325 0.0450 0.5295923
## 4 0.012115576 1.7593752 0.1985 0.6918306
## 5 0.288689886 17.6989867 0.0002 0.0173200
## 6 0.070450526 3.1288029 0.0843 0.5514736
## 7 0.059048355 2.4321934 0.1210 0.6389390
## 8 0.008158889 0.3096316 0.5929 0.8662785
## 9 0.126679408 5.3997232 0.0230 0.3688519
## 10 0.071240975 2.9536943 0.0944 0.5523676
## 11 0.054676613 2.5409566 0.1171 0.6338037
## 12 0.018106654 1.6325839 0.1921 0.6918306
## 13 0.229494281 11.0374698 0.0026 0.1125800
## 14 0.119196577 7.9016380 0.0096 0.2256158
## 15 0.082827949 6.7648635 0.0143 0.2948524
We can see a number of pathogens associated with soybean as a previous crop. These all “contribute” 1% to the solution.
contrast = 'ROTATION_LENGTH'
barcode = '16S'
contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')
nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr)
perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)
contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
## CROP SAMPLING BARCODE CONTRAST RSQ F_OBS Df P_VAL
## 1 corn seedling 16S ROTATION_LENGTH 0.0316659 1.326651 2,37 0.0431
ord_ls[[nn]] = perms
ROTATION explains 3.17 of corn 16S community variation at seedling. This is an intermediate (p = 0.043) effect with a clear visual outcome in both this ordination in the response ordination. We will investigate taxa.
barcode = 'ITS'
contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')
nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr)
perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)
contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
## CROP SAMPLING BARCODE CONTRAST RSQ F_OBS Df P_VAL
## 1 corn seedling ITS ROTATION_LENGTH 0.06618771 3.307242 2,37 1e-04
ord_ls[[nn]] = perms
ROTATION explains 6.62 of corn ITS community variation at seedling. This is a strong (p = 10^{-4}) effect. We will investigate taxa.
Here are the axis scores combined with other model parameters.
find_ords = c(crop, sampling, contrast) %>%
sapply(grepl, names(ord_ls)) %>%
apply(1, all)
local_ords = ord_ls[find_ords] %>%
lapply(extract2, 'observed')
sem_name = paste(crop, sampling, contrast, sep='_')
mod_df = generate_prediction_df(local_ords,
crop,
sampling,
contrasts[[crop]][contrast,,drop=FALSE],
partial,
constraint,
response,
plot,
auto_invert=TRUE) # if contrast and axis are negatively correlated, for consistency.
head(mod_df)
The SEM models are:
no_mediation: A null model where AV_DW responds only to ROTATION_LENGTHpartial_mediation: A full model where AV_DW responds to both ROTATION_LENGTH and the communityfull_mediation: An intermediate model where AV_DW responds only to the communityIn all models, YEAR is included as a predictor of AV_DW, as well.
mod_df %<>% na.omit
no_mediation = {
"
# Model
AV_DW ~ YEAR + c*CONTRAST
AXIS_ITS ~ a*CONTRAST
AXIS_16S ~ d*CONTRAST
# covariances
AXIS_ITS ~~ 0*AV_DW
AXIS_16S ~~ 0*AV_DW
AXIS_ITS ~~ AXIS_16S
# Calculate pathways for easy interpretation
direct := c
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
partial_mediation_full = {
"
AV_DW ~ YEAR + b*AXIS_ITS + e*AXIS_16S + c*CONTRAST
AXIS_ITS ~ a*CONTRAST
AXIS_16S ~ d*CONTRAST
# covariances
AXIS_ITS ~~ AXIS_16S
direct := c
total := c + (a*b) + (d*e)
indirect_its := a*b
indirect_16s := d*e
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
partial_mediation_16S = {
"
AV_DW ~ YEAR + e*AXIS_16S + c*CONTRAST
AXIS_ITS ~ a*CONTRAST
AXIS_16S ~ d*CONTRAST
# covariances
AXIS_ITS ~~ 0*AV_DW
AXIS_ITS ~~ AXIS_16S
direct := c
total := c + (d*e)
indirect_16s := d*e
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
partial_mediation_ITS = {
"
AV_DW ~ YEAR + b*AXIS_ITS + c*CONTRAST
AXIS_ITS ~ a*CONTRAST
AXIS_16S ~ d*CONTRAST
# covariances
AXIS_16S ~~ 0*AV_DW
AXIS_ITS ~~ AXIS_16S
direct := c
total := c + (a*b)
indirect_its := a*b
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
full_mediation_full = {
"
AV_DW ~ YEAR + b*AXIS_ITS + e*AXIS_16S
AXIS_ITS ~ a*CONTRAST
AXIS_16S ~ d*CONTRAST
# covariances
AXIS_ITS ~~ AXIS_16S
indirect_its := a*b
indirect_16s := d*e
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
full_mediation_16S = {
"
AV_DW ~ YEAR + e*AXIS_16S
AXIS_ITS ~ a*CONTRAST
AXIS_16S ~ d*CONTRAST
# covariances
AXIS_ITS ~~ AXIS_16S
AXIS_ITS ~~ 0*AV_DW
indirect_16s := d*e
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
full_mediation_ITS = {
"
AV_DW ~ YEAR + b*AXIS_ITS
AXIS_ITS ~ a*CONTRAST
AXIS_16S ~ d*CONTRAST
# covariances
AXIS_ITS ~~ AXIS_16S
AXIS_16S ~~ 0*AV_DW
indirect_its := a*b
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
model_compares = anova(no_mediation, partial_mediation_full, partial_mediation_16S, partial_mediation_ITS, full_mediation_full, full_mediation_16S, full_mediation_ITS) %T>%
print
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## partial_mediation_full 2 221.39 243.34 0.0008
## partial_mediation_16S 3 220.74 241.00 1.3476 1.34675 1 0.2458
## partial_mediation_ITS 3 219.68 239.94 0.2872 -1.06038 0
## full_mediation_full 3 219.51 239.78 0.1244 -0.16282 0
## no_mediation 4 219.01 237.59 1.6238 1.49941 1 0.2208
## full_mediation_16S 4 219.82 238.40 2.4339 0.81010 0
## full_mediation_ITS 4 218.98 237.55 1.5894 -0.84452 0
keep_mod = which.min(model_compares$AIC) %>%
rownames(model_compares)[.]
sem_ls[[sem_name]] = list(no_mediation = no_mediation,
partial_mediation_ITS = partial_mediation_ITS,
partial_mediation_16S = partial_mediation_16S,
full_mediation_ITS = full_mediation_ITS,
full_mediation_16S = full_mediation_16S,
full_mediation_full = full_mediation_full,
df = mod_df)
The best model is full_mediation_ITS. However, this is only by 0.03 AIC points - within rounding error. We will select the no_mediation model as it is the “null”.
mod_summ = summary(get(keep_mod), std=TRUE)
## lavaan 0.6-5 ended normally after 26 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 11
##
## Number of observations 40
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 1.589
## Degrees of freedom 4
## P-value (Chi-square) 0.811
##
## Parameter Estimates:
##
## Standard errors Bootstrap
## Number of requested bootstrap draws 1000
## Number of successful bootstrap draws 1000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## AV_DW ~
## YEAR -1.602 0.176 -9.115 0.000 -1.602 -0.812
## AXIS_ITS (b) -0.252 0.063 -3.995 0.000 -0.252 -0.252
## AXIS_ITS ~
## ROTATION_L (a) 1.484 0.130 11.451 0.000 1.484 0.751
## AXIS_16S ~
## ROTATION_L (d) 1.714 0.201 8.505 0.000 1.714 0.868
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .AXIS_ITS ~~
## .AXIS_16S -0.000 0.057 -0.003 0.997 -0.000 -0.001
## .AV_DW ~~
## .AXIS_16S 0.000 0.000 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .AV_DW 2.404 0.338 7.117 0.000 2.404 2.436
## .AXIS_ITS -0.000 0.107 -0.000 1.000 -0.000 -0.000
## .AXIS_16S -0.000 0.078 -0.000 1.000 -0.000 -0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .AV_DW 0.270 0.106 2.537 0.011 0.270 0.277
## .AXIS_ITS 0.425 0.093 4.561 0.000 0.425 0.435
## .AXIS_16S 0.241 0.047 5.087 0.000 0.241 0.247
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## indirect_its -0.374 0.091 -4.117 0.000 -0.374 -0.189
This is an excellent model.
lavaanPlot(model=get(keep_mod), coefs=TRUE)
Neither barcode community credibly explains seedling biomass better than ROTATION_LENGTH alone.
Combining both contrast axis scores into a data.frame to build a corn seedling biomass model.
contrast = c('FOLLOW_SOY', 'ROTATION_LENGTH')
barcode = c('ITS', 'ITS')
sem_name = sapply(seq_len(length(barcode)), function(x) paste(crop, sampling, barcode[x], contrast[x], sep='_'))
local_ords = ord_ls[sem_name] %>%
lapply(extract2, 'observed')
mod_df = lapply(contrast, function(x) {
out = generate_prediction_df(local_ords,
crop,
sampling,
contrasts[[crop]][x,,drop=FALSE],
partial,
constraint,
response,
plot,
auto_invert=TRUE) # if contrast and axis are negatively correlated, for consistency.
is_axis = grepl('AXIS', names(out))
names(out)[is_axis] %<>% paste(x, sep='_')
return(out)
}) %>%
set_names(contrast)
tt = mod_df[[1]]
if (length(mod_df) > 1) {
for (i in 2:length(mod_df)) {
merge_by = 'SAMPLE_ID'
mm = mod_df[[i]]
is_axis = grepl('AXIS', names(mm)) %>%
names(mm)[.]
tt = merge(tt, mm[, c(merge_by, is_axis, contrast[i])], by=merge_by, all=TRUE)
}
}
mod_df = tt
Here are the combined SEM results
mod_df %<>% na.omit
axis_name = paste0('AXIS_', barcode)
no_mediation = {
"
# Model
AV_DW ~ YEAR + a*CONTRAST1 + b*CONTRAST2
AXIS1_CONTRAST1 ~ c*CONTRAST1
AXIS2_CONTRAST2 ~ d*CONTRAST2
# covariances
AXIS1_CONTRAST1 + AXIS2_CONTRAST2 ~~ 0*AV_DW
AXIS1_CONTRAST1 ~~ AXIS2_CONTRAST2
# Calculate pathways for easy interpretation
direct := a + b
"
} %>%
gsub('CONTRAST1', contrast[1], .) %>%
gsub('CONTRAST2', contrast[2], .) %>%
gsub('AXIS1', axis_name[1], .) %>%
gsub('AXIS2', axis_name[2], .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
partial_mediation_full = {
"
# Model
AV_DW ~ YEAR + a*CONTRAST1 + b*CONTRAST2 + e*AXIS1_CONTRAST1 + f*AXIS2_CONTRAST2
AXIS1_CONTRAST1 ~ c*CONTRAST1
AXIS2_CONTRAST2 ~ d*CONTRAST2
# covariances
AXIS1_CONTRAST1 + AXIS2_CONTRAST2 ~~ 0*AV_DW
AXIS1_CONTRAST1 ~~ AXIS2_CONTRAST2
# Calculate pathways for easy interpretation
direct := a + b
indirect_CONTRAST1 := c*e
indirect_CONTRAST2 := d*f
total_CONTRAST1 := a + (c*e)
total_CONTRAST2 := b + (d*f)
"
} %>%
gsub('CONTRAST1', contrast[1], .) %>%
gsub('CONTRAST2', contrast[2], .) %>%
gsub('AXIS1', axis_name[1], .) %>%
gsub('AXIS2', axis_name[2], .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
full_mediation_full = {
"
# Model
AV_DW ~ YEAR + e*AXIS1_CONTRAST1 + f*AXIS2_CONTRAST2
AXIS1_CONTRAST1 ~ c*CONTRAST1
AXIS2_CONTRAST2 ~ d*CONTRAST2
# covariances
AXIS1_CONTRAST1 + AXIS2_CONTRAST2 ~~ 0*AV_DW
AXIS1_CONTRAST1 ~~ AXIS2_CONTRAST2
# Calculate pathways for easy interpretation
#direct := a + b
indirect_CONTRAST1 := c*e
indirect_CONTRAST2 := d*f
#total_CONTRAST1 := a + (c*e)
#total_CONTRAST2 := b + (d*f)
"
} %>%
gsub('CONTRAST1', contrast[1], .) %>%
gsub('CONTRAST2', contrast[2], .) %>%
gsub('AXIS1', axis_name[1], .) %>%
gsub('AXIS2', axis_name[2], .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
partial_follow_full_length = {
"
# Model
AV_DW ~ YEAR + a*CONTRAST1 + e*AXIS1_CONTRAST1 + f*AXIS2_CONTRAST2
AXIS1_CONTRAST1 ~ c*CONTRAST1
AXIS2_CONTRAST2 ~ d*CONTRAST2
# covariances
AXIS1_CONTRAST1 + AXIS2_CONTRAST2 ~~ 0*AV_DW
AXIS1_CONTRAST1 ~~ AXIS2_CONTRAST2
# Calculate pathways for easy interpretation
direct := a#
indirect_CONTRAST1 := c*e
indirect_CONTRAST2 := d*f
total_CONTRAST1 := a + (c*e)
"
} %>%
gsub('CONTRAST1', contrast[1], .) %>%
gsub('CONTRAST2', contrast[2], .) %>%
gsub('AXIS1', axis_name[1], .) %>%
gsub('AXIS2', axis_name[2], .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
full_follow_partial_length = {
"
# Model
AV_DW ~ YEAR + b*CONTRAST2 + e*AXIS1_CONTRAST1 + f*AXIS2_CONTRAST2
AXIS1_CONTRAST1 ~ c*CONTRAST1
AXIS2_CONTRAST2 ~ d*CONTRAST2
# covariances
AXIS1_CONTRAST1 + AXIS2_CONTRAST2 ~~ 0*AV_DW
AXIS1_CONTRAST1 ~~ AXIS2_CONTRAST2
# Calculate pathways for easy interpretation
direct := b
indirect_CONTRAST1 := c*e
indirect_CONTRAST2 := d*f
total_CONTRAST2 := b + (d*f)
"
} %>%
gsub('CONTRAST1', contrast[1], .) %>%
gsub('CONTRAST2', contrast[2], .) %>%
gsub('AXIS1', axis_name[1], .) %>%
gsub('AXIS2', axis_name[2], .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
full_follow_none_length = {
"
# Model
AV_DW ~ YEAR + b*CONTRAST2 + e*AXIS1_CONTRAST1
AXIS1_CONTRAST1 ~ c*CONTRAST1
AXIS2_CONTRAST2 ~ d*CONTRAST2
# covariances
AXIS1_CONTRAST1 + AXIS2_CONTRAST2 ~~ 0*AV_DW
AXIS1_CONTRAST1 ~~ AXIS2_CONTRAST2
# Calculate pathways for easy interpretation
direct := b
indirect_CONTRAST1 := c*e
"
} %>%
gsub('CONTRAST1', contrast[1], .) %>%
gsub('CONTRAST2', contrast[2], .) %>%
gsub('AXIS1', axis_name[1], .) %>%
gsub('AXIS2', axis_name[2], .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
full_follow_only = {
"
# Model
AV_DW ~ YEAR + e*AXIS1_CONTRAST1
AXIS1_CONTRAST1 ~ c*CONTRAST1
AXIS2_CONTRAST2 ~ d*CONTRAST2
# covariances
AXIS1_CONTRAST1 + AXIS2_CONTRAST2 ~~ 0*AV_DW
AXIS1_CONTRAST1 ~~ AXIS2_CONTRAST2
# Calculate pathways for easy interpretation
indirect_CONTRAST1 := c*e
"
} %>%
gsub('CONTRAST1', contrast[1], .) %>%
gsub('CONTRAST2', contrast[2], .) %>%
gsub('AXIS1', axis_name[1], .) %>%
gsub('AXIS2', axis_name[2], .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
model_compares = anova(full_follow_partial_length, partial_follow_full_length, full_mediation_full, partial_mediation_full, no_mediation, full_follow_none_length, full_follow_only) %T>%
print
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff
## partial_mediation_full 4 226.72 250.37 15.072
## full_follow_partial_length 5 224.74 246.70 15.091 0.0188 1
## partial_follow_full_length 5 229.24 251.20 19.592 4.5018 0
## full_mediation_full 6 227.49 247.75 19.838 0.2457 1
## no_mediation 6 232.61 252.88 24.961 5.1228 0
## full_follow_none_length 6 224.32 244.59 16.670 -8.2909 0
## full_follow_only 7 225.71 244.29 20.064 3.3946 1
## Pr(>Chisq)
## partial_mediation_full
## full_follow_partial_length 0.89089
## partial_follow_full_length
## full_mediation_full 0.62014
## no_mediation
## full_follow_none_length
## full_follow_only 0.06541 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
keep_mod = which.min(model_compares$AIC) %>%
rownames(model_compares)[.]
if (length(sem_name) > 1) sem_name = paste(crop, sampling, paste(unique(barcode), collapse='_'), 'COMBINED', sep='_')
sem_ls[[sem_name]] = list(full_follow_partial_length = full_follow_partial_length,
partial_follow_full_length = full_follow_partial_length,
full_mediation_full = full_follow_partial_length,
partial_mediation_full = full_follow_partial_length,
no_mediation = full_follow_partial_length,
full_follow_none_length = full_follow_partial_length,
full_follow_only = full_follow_partial_length,
df = mod_df)
The best model is:
lavaanPlot(model=get(keep_mod), coefs=TRUE)
With a poor model fit:
anova(get(keep_mod))
So we will use just the FOLLOW_SOY model.
select_barcode = 'ITS'
#' plot response ~ barcode_axis based on data in mod_df created above.
#' add shapes for partial and colors for constraint
#' response is optionally the residuals of response ~ partial.
#' returns a ggplot object for easy post-hoc formatting
nn = keep_mod
pp = c(partial, 'ROTATION_LENGTH')
plot_effect(select_barcode, response, pp, constraint, contrast[1], mod_df, partial_response=TRUE, nn)
First we see if these communities respond to the treatment.
crop = 'corn'
sampling = 'flowering'
constraint = 'ROTATION'
partial = 'YEAR'
barcode = '16S'
contrast = 'None'
nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_name = paste(crop, sampling, barcode, '\npartial, constrained')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name)
perms = permute_ord(ord, metric='model_F', times=nperms, permute_on='partial')
full_ord_results %<>% update_full_ord_results(crop, sampling, barcode, perms)
## CROP SAMPLING BARCODE RSQ F_OBS Df P_VAL
## 1 corn flowering 16S 0.1258768 1.267429 5,32 0.0037
ord_ls[[nn]] = perms # structure includes: permutation results, observed data (pcwOrd object).
ROTATION explains 12.6 of corn 16S community variation at flowering (p = 0.0037). Therefore, we will test contrasts with the 16S dataset.
barcode = 'ITS'
nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_name = paste(crop, sampling, barcode, '\npartial, constrained')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name)
perms = permute_ord(ord, metric='model_F', times=nperms, permute_on='partial')
full_ord_results %<>% update_full_ord_results(crop, sampling, barcode, perms)
## CROP SAMPLING BARCODE RSQ F_OBS Df P_VAL
## 1 corn flowering ITS 0.1386246 1.601597 5,34 0.0018
ord_ls[[nn]] = perms # structure includes: permutation results, observed data (pcwOrd object).
ROTATION explains 13.9 of corn ITS community variation at flowering (p = 0.0018). Therefore, we will test contrasts with the ITS dataset.
contrast = 'ROTATION_LENGTH'
barcode = '16S'
contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')
nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr)
perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)
contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
## CROP SAMPLING BARCODE CONTRAST RSQ F_OBS Df P_VAL
## 1 corn flowering 16S ROTATION_LENGTH 0.03363979 1.327731 2,35 0.0522
ord_ls[[nn]] = perms
ROTATION_LENGTH explains 3.36 of corn 16S community variation at flowering (p = 0.052). We will investigate taxa.
barcode = 'ITS'
contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')
nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr)
perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)
contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
## CROP SAMPLING BARCODE CONTRAST RSQ F_OBS Df P_VAL
## 1 corn flowering ITS ROTATION_LENGTH 0.03785292 1.674345 2,37 0.0327
ord_ls[[nn]] = perms
ROTATION_LENGTH explains 3.79 of corn ITS community variation at flowering. This is a strong (p = 0.033) effect. We will investigate taxa.
find_ords = c(crop, sampling, contrast) %>%
sapply(grepl, names(ord_ls)) %>%
apply(1, all)
local_ords = ord_ls[find_ords] %>%
lapply(extract2, 'observed')
sem_name = paste(crop, sampling, contrast, sep='_')
mod_df = generate_prediction_df(local_ords,
crop,
sampling,
contrasts[[crop]][contrast, , drop=FALSE],
partial,
constraint,
response,
plot,
auto_invert=TRUE) # if contrast and axis are negatively correlated, for consistency.
# Add T1 biomass
seed_bio = subset(plot, SAMPLING=='seedling' & CROP=='corn', select=c('SAMPLE_ID', 'AV_DW'))
colnames(seed_bio)[2] = 'T1_AV_DW'
seed_bio$SAMPLE_ID %<>% as.character %>%
strsplit('_') %>%
sapply(function(x) paste(x[1:3], collapse='_'))
mod_df$SID = mod_df$SAMPLE_ID %>%
as.character %>%
strsplit('_') %>%
sapply(function(x) paste(x[1:3], collapse='_'))
mod_df %<>% merge(seed_bio, by.x='SID', by.y='SAMPLE_ID')
head(mod_df)
The SEM models are:
no_mediation: A null model where AV_DW responds only to ROTATION_LENGTHpartial_mediation: A full model where AV_DW responds to both ROTATION_LENGTH and the communityfull_mediation: An intermediate model where AV_DW responds only to the communityIn all models, YEAR is included as a predictor of AV_DW, as well.
mod_df %<>% na.omit
no_mediation = {
"
# Model
AV_DW ~ YEAR + c*CONTRAST + T1_AV_DW
AXIS_ITS ~ a*CONTRAST
AXIS_16S ~ d*CONTRAST
# covariances
AXIS_ITS ~~ 0*AV_DW
AXIS_16S ~~ 0*AV_DW
AXIS_ITS ~~ 0*AXIS_16S
# Calculate pathways for easy interpretation
direct := c
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
partial_mediation_full = {
"
AV_DW ~ YEAR + b*AXIS_ITS + e*AXIS_16S + c*CONTRAST + T1_AV_DW
AXIS_ITS ~ a*CONTRAST
AXIS_16S ~ d*CONTRAST
# covariances
#AXIS_ITS ~~ AXIS_16S
direct := c
total := c + (a*b) + (d*e)
indirect_its := a*b
indirect_16s := d*e
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
partial_mediation_16S = {
"
AV_DW ~ YEAR + e*AXIS_16S + c*CONTRAST + T1_AV_DW
AXIS_ITS ~ a*CONTRAST
AXIS_16S ~ d*CONTRAST
# covariances
AXIS_ITS ~~ 0*AV_DW
direct := c
total := c + (d*e)
indirect_16s := d*e
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
partial_mediation_ITS = {
"
AV_DW ~ YEAR + b*AXIS_ITS + c*CONTRAST + T1_AV_DW
AXIS_ITS ~ a*CONTRAST
AXIS_16S ~ d*CONTRAST
# covariances
AXIS_16S ~~ 0*AV_DW
direct := c
total := c + (a*b)
indirect_its := a*b
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
full_mediation_full = {
"
AV_DW ~ YEAR + b*AXIS_ITS + e*AXIS_16S + T1_AV_DW
AXIS_ITS ~ a*CONTRAST
AXIS_16S ~ d*CONTRAST
# covariances
#AXIS_ITS ~~ AXIS_16S
indirect_its := a*b
indirect_16s := d*e
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
full_mediation_16S = {
"
AV_DW ~ YEAR + e*AXIS_16S + T1_AV_DW
AXIS_ITS ~ a*CONTRAST
AXIS_16S ~ d*CONTRAST
# covariances
#AXIS_ITS ~~ AXIS_16S
AXIS_ITS ~~ 0*AV_DW
indirect_16s := d*e
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
full_mediation_ITS = {
"
AV_DW ~ YEAR + b*AXIS_ITS + T1_AV_DW
AXIS_ITS ~ a*CONTRAST
AXIS_16S ~ d*CONTRAST
# covariances
#AXIS_ITS ~~ AXIS_16S
AXIS_16S ~~ 0*AV_DW
indirect_its := a*b
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
model_compares = anova(no_mediation, partial_mediation_full, partial_mediation_16S, partial_mediation_ITS, full_mediation_full, full_mediation_16S, full_mediation_ITS) %T>%
print
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## partial_mediation_full 5 219.75 241.04 2.5267
## partial_mediation_16S 6 217.89 237.54 2.6636 0.1369 1 0.71142
## partial_mediation_ITS 6 222.46 242.11 7.2329 4.5693 0
## full_mediation_full 6 217.76 237.42 2.5369 -4.6961 0
## no_mediation 7 220.47 238.48 7.2378 4.7009 1 0.03015 *
## full_mediation_16S 7 215.91 233.92 2.6763 -4.5616 0
## full_mediation_ITS 7 222.50 240.51 9.2672 6.5909 0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
keep_mod = which.min(model_compares$AIC) %>%
rownames(model_compares)[.]
sem_ls[[sem_name]] = list(no_mediation = no_mediation,
partial_mediation_ITS = partial_mediation_ITS,
partial_mediation_16S = partial_mediation_16S,
full_mediation_ITS = full_mediation_ITS,
full_mediation_16S = full_mediation_16S,
full_mediation_full = full_mediation_full,
df = mod_df)
The best model is full_mediation_16S. ITS models are the worst - worse than no_mediation.
mod_summ = summary(get(keep_mod), std=TRUE)
## lavaan 0.6-5 ended normally after 33 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 11
##
## Number of observations 38
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 2.676
## Degrees of freedom 7
## P-value (Chi-square) 0.913
##
## Parameter Estimates:
##
## Standard errors Bootstrap
## Number of requested bootstrap draws 1000
## Number of successful bootstrap draws 1000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## AV_DW ~
## YEAR -0.922 0.316 -2.921 0.003 -0.922 -0.462
## AXIS_16S (e) -0.281 0.087 -3.244 0.001 -0.281 -0.278
## T1_AV_DW 3.881 2.068 1.877 0.061 3.881 0.366
## AXIS_ITS ~
## ROTATION_L (a) 1.618 0.209 7.733 0.000 1.618 0.794
## AXIS_16S ~
## ROTATION_L (d) 1.601 0.335 4.773 0.000 1.601 0.786
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .AV_DW ~~
## .AXIS_ITS 0.000 0.000 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .AV_DW 0.661 0.822 0.805 0.421 0.661 0.664
## .AXIS_ITS 0.032 0.102 0.313 0.754 0.032 0.032
## .AXIS_16S 0.032 0.109 0.289 0.773 0.032 0.032
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .AV_DW 0.247 0.062 3.988 0.000 0.247 0.249
## .AXIS_ITS 0.359 0.086 4.168 0.000 0.359 0.369
## .AXIS_16S 0.372 0.094 3.968 0.000 0.372 0.382
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## indirect_16s -0.449 0.168 -2.674 0.007 -0.449 -0.218
lavaanPlot(model=get(keep_mod), coefs=TRUE)
select_barcode = '16S'
nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_effect(select_barcode, response, partial, constraint, contrast, mod_df, partial_response=TRUE, nn)
The taxa that load most credibly onto this axis are:
select_barcode = '16S'
is_ITS = select_barcode=='ITS'
if (is_ITS) {
depth = 'Species'
guilds = funguild
} else {
depth = 'Genus'
guilds = NULL
}
ord_obj = paste(crop, sampling, select_barcode, contrast, sep='_') %>%
ord_ls[[.]]
top_tax = id_top_tax(ord_obj,
phylo_obj[[select_barcode]],
metric='feature_F',
phylo_depth=depth,
max_p_adj=1,
max_p_val=1,
guilds=NULL,
auto_invert=TRUE) %>%
subset(pcAxis1^2 > 0.01)
tt_sort = order(top_tax$pcAxis1, decreasing=TRUE)
# subset(top_tax[tt_sort, ], pcAxis1^2 > 0.005)
# top_tax[tt_sort, ]
# tt_sort = order(top_tax$pcAxis1, decreasing=TRUE)
# top_tax %<>% .[tt_sort, ] %T>% print# %>%
# subset(pcAxis1^2 > 0.005) %T>%
# print
# subset(top_tax[tt_sort, ], p_val < 0.05)
taxa_results %<>% update_taxa_results(crop, sampling, select_barcode, contrast, top_tax[tt_sort, ])
## CROP SAMPLING BARCODE CONTRAST ASV PHYLUM
## 1 corn flowering 16S ROTATION_LENGTH ASV_204 Proteobacteria
## 2 corn flowering 16S ROTATION_LENGTH ASV_222 Actinobacteria
## 3 corn flowering 16S ROTATION_LENGTH ASV_479 Actinobacteria
## 4 corn flowering 16S ROTATION_LENGTH ASV_136 Actinobacteria
## 5 corn flowering 16S ROTATION_LENGTH ASV_460 Verrucomicrobia
## 6 corn flowering 16S ROTATION_LENGTH ASV_307 Actinobacteria
## 7 corn flowering 16S ROTATION_LENGTH ASV_238 Proteobacteria
## 8 corn flowering 16S ROTATION_LENGTH ASV_284 Proteobacteria
## 9 corn flowering 16S ROTATION_LENGTH ASV_236 Proteobacteria
## 10 corn flowering 16S ROTATION_LENGTH ASV_183 Verrucomicrobia
## 11 corn flowering 16S ROTATION_LENGTH ASV_149 Actinobacteria
## 12 corn flowering 16S ROTATION_LENGTH ASV_417 Actinobacteria
## 13 corn flowering 16S ROTATION_LENGTH ASV_103 Actinobacteria
## 14 corn flowering 16S ROTATION_LENGTH ASV_86 Actinobacteria
## 15 corn flowering 16S ROTATION_LENGTH ASV_63 Verrucomicrobia
## 16 corn flowering 16S ROTATION_LENGTH ASV_79 Actinobacteria
## 17 corn flowering 16S ROTATION_LENGTH ASV_122 Actinobacteria
## 18 corn flowering 16S ROTATION_LENGTH ASV_113 Verrucomicrobia
## 19 corn flowering 16S ROTATION_LENGTH ASV_74 Verrucomicrobia
## 20 corn flowering 16S ROTATION_LENGTH ASV_242 Verrucomicrobia
## 21 corn flowering 16S ROTATION_LENGTH ASV_384 Verrucomicrobia
## 22 corn flowering 16S ROTATION_LENGTH ASV_115 Actinobacteria
## TAXA GUILD AXIS
## 1 Burkholderiaceae Burkholderia-Caballeronia-Paraburkholderia NA 0.2692900
## 2 Nocardioidaceae Kribbella NA 0.2146798
## 3 Pseudonocardiaceae Amycolatopsis NA 0.1985530
## 4 Solirubrobacteraceae Conexibacter NA 0.1826547
## 5 Rubritaleaceae Luteolibacter NA 0.1593833
## 6 Intrasporangiaceae Lapillicoccus NA 0.1587951
## 7 Burkholderiaceae Variovorax NA 0.1442220
## 8 Burkholderiaceae Burkholderia-Caballeronia-Paraburkholderia NA 0.1273779
## 9 Burkholderiaceae Variovorax NA 0.1240912
## 10 Chthoniobacteraceae Candidatus_Udaeobacter NA 0.1156731
## 11 Cellulomonadaceae Cellulomonas NA 0.1111435
## 12 Propionibacteriaceae Microlunatus NA 0.1097302
## 13 Gaiellaceae Gaiella NA 0.1052552
## 14 Streptomycetaceae Streptomyces NA 0.1029040
## 15 Chthoniobacteraceae Candidatus_Udaeobacter NA -0.1122029
## 16 Micrococcaceae Pseudarthrobacter NA -0.1194276
## 17 Intrasporangiaceae Lapillicoccus NA -0.1325373
## 18 Chthoniobacteraceae Candidatus_Udaeobacter NA -0.1406555
## 19 Chthoniobacteraceae Candidatus_Udaeobacter NA -0.1476605
## 20 Chthoniobacteraceae Candidatus_Udaeobacter NA -0.1658830
## 21 Chthoniobacteraceae Candidatus_Udaeobacter NA -0.1736827
## 22 Propionibacteriaceae Microlunatus NA -0.1872172
## RSQ F_OBS P_VAL P_ADJ
## 1 0.26980693 13.857270 0.0026 0.1924000
## 2 0.21639662 9.707878 0.0039 0.2405000
## 3 0.39011518 22.641996 0.0004 0.0740000
## 4 0.08309827 3.366535 0.0939 0.8447170
## 5 0.05676092 2.113783 0.1613 0.8447170
## 6 0.40386249 23.715336 0.0003 0.0740000
## 7 0.09121844 3.840779 0.0433 0.8447170
## 8 0.19690625 9.276857 0.0081 0.3746250
## 9 0.12244231 5.086227 0.0300 0.8411333
## 10 0.06752865 2.744123 0.1167 0.8447170
## 11 0.08181198 3.742637 0.0703 0.8447170
## 12 0.05304831 2.381707 0.1106 0.8447170
## 13 0.03039700 1.106445 0.2962 0.8447170
## 14 0.25836391 13.093155 0.0017 0.1572500
## 15 0.01402708 0.498419 0.4695 0.8447170
## 16 0.05473083 2.317797 0.1466 0.8447170
## 17 0.08140144 4.228837 0.0529 0.8447170
## 18 0.10437767 4.200901 0.0500 0.8447170
## 19 0.06641153 2.625184 0.0989 0.8447170
## 20 0.03452340 1.326884 0.2481 0.8447170
## 21 0.04981441 1.837217 0.1809 0.8447170
## 22 0.13573839 5.526551 0.0303 0.8411333
Full model predicting yield based on above results.
Approach:
sem_name = 'corn_combined'
models = c('corn_seedling_FOLLOW_SOY',
'corn_flowering_ROTATION_LENGTH')
plot_sub = c('SAMPLE_ID', 'PLOT', 'YEAR', 'CROP', 'CROPYIELD') %>%
plot[, .] %>%
subset(!is.na(CROPYIELD))
plot_sub$SID = with(plot_sub, paste(CROP, YEAR, PLOT, sep='_'))
seed = sem_ls[[models[1]]]$df %>%
extract(c('SAMPLE_ID', 'AXIS_ITS', 'FOLLOW_SOY'))
colnames(seed)[c(2,3)] %<>% paste('T1', ., sep='_')
seed$SID = seed$SAMPLE_ID %>%
as.character %>%
strsplit('_') %>%
lapply(extract, 1:3) %>%
lapply(paste, collapse='_')
seed$SAMPLE_ID = NULL
flow = sem_ls[[models[2]]]$df %>%
extract(c('SID', 'AXIS_16S', 'ROTATION_LENGTH', 'AV_DW', 'T1_AV_DW'))
colnames(flow)[c(2:4)] %<>% paste('T2', ., sep='_')
mod_df = merge(plot_sub, seed, by='SID') %>%
merge(flow, by='SID')
is_numeric = sapply(mod_df, is.numeric)
mod_df[is_numeric] %<>% scale
CHECKHERE
mod_df %<>% na.omit
no_microbe = {
"
CROPYIELD ~ YEAR + T2_AV_DW + T2_ROTATION_LENGTH
T2_AV_DW ~ YEAR + T2_ROTATION_LENGTH + T1_AV_DW
T1_AV_DW ~ YEAR + T1_FOLLOW_SOY
T2_AXIS_16S ~ T2_ROTATION_LENGTH
T1_AXIS_ITS ~ T1_FOLLOW_SOY
"
} %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
full_microbe = {
"
CROPYIELD ~ YEAR + a*T2_AV_DW + T2_ROTATION_LENGTH
T2_AV_DW ~ YEAR + b*T1_AV_DW + c*T2_AXIS_16S
T1_AV_DW ~ YEAR + d*T1_AXIS_ITS
T2_AXIS_16S ~ T2_ROTATION_LENGTH
T1_AXIS_ITS ~ T1_FOLLOW_SOY
# microbe paths
t1_microbe := a*b*d
t2_microbe := a*c
"
} %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
biomass_only = {
"
CROPYIELD ~ YEAR + T2_AV_DW
T2_AV_DW ~ YEAR + T1_AV_DW
"
} %>%
sem(mod_df, se=sem_error, std.ov=TRUE, missing='ML')
model_compares = anova(no_microbe, full_microbe) %T>% print
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## no_microbe 12 316.39 354.06 34.262
## full_microbe 15 301.56 334.31 25.428 -8.8337 3 1
keep_mod = which.min(model_compares$AIC) %>%
rownames(model_compares)[.]
sem_ls[[sem_name]] = list(no_microbe = no_microbe,
full_microbe = full_microbe,
df = mod_df)
full_microbe is the best, by -14.83 AIC units.
lavaanPlot(model=get(keep_mod), coefs=TRUE, stand=TRUE)
summary(full_microbe, std=TRUE)
## lavaan 0.6-5 ended normally after 44 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 20
##
## Number of observations 38
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 25.428
## Degrees of freedom 15
## P-value (Chi-square) 0.044
##
## Parameter Estimates:
##
## Standard errors Bootstrap
## Number of requested bootstrap draws 1000
## Number of successful bootstrap draws 1000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## CROPYIELD ~
## YEAR 2.385 0.176 13.571 0.000 2.385 1.221
## T2_AV_DW (a) 0.401 0.110 3.631 0.000 0.401 0.404
## T2_ROTATIO -0.154 0.070 -2.214 0.027 -0.154 -0.156
## T2_AV_DW ~
## YEAR -0.922 0.331 -2.783 0.005 -0.922 -0.469
## T1_AV_DW (b) 0.369 0.204 1.814 0.070 0.369 0.372
## T2_AXIS_16 (c) -0.281 0.086 -3.270 0.001 -0.281 -0.282
## T1_AV_DW ~
## YEAR -1.677 0.149 -11.238 0.000 -1.677 -0.845
## T1_AXIS_IT (d) -0.346 0.108 -3.204 0.001 -0.346 -0.344
## T2_AXIS_16S ~
## T2_ROTATIO 0.786 0.168 4.676 0.000 0.786 0.786
## T1_AXIS_ITS ~
## T1_FOLLOW_ 0.690 0.116 5.974 0.000 0.690 0.690
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .CROPYIELD -3.641 0.253 -14.409 0.000 -3.641 -3.733
## .T2_AV_DW 1.408 0.475 2.966 0.003 1.408 1.432
## .T1_AV_DW 2.559 0.275 9.316 0.000 2.559 2.582
## .T2_AXIS_16S 0.000 0.100 0.000 1.000 0.000 0.000
## .T1_AXIS_ITS 0.000 0.117 0.000 1.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .CROPYIELD 0.081 0.017 4.650 0.000 0.081 0.085
## .T2_AV_DW 0.247 0.061 4.064 0.000 0.247 0.256
## .T1_AV_DW 0.177 0.055 3.242 0.001 0.177 0.180
## .T2_AXIS_16S 0.372 0.091 4.102 0.000 0.372 0.382
## .T1_AXIS_ITS 0.510 0.094 5.423 0.000 0.510 0.523
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## t1_microbe -0.051 0.027 -1.926 0.054 -0.051 -0.052
## t2_microbe -0.113 0.051 -2.185 0.029 -0.113 -0.114
Continuing on to soybean…
As with univariate tests, we’ll first test whether our response, the microbial community, responds to the overall treatments. If so, we will test the contrasts that were also predictive of plant health (biomass). The dataset will be reduced to those taxa responding to the contrast. Output will be the same as for corn.
First we see if these communities respond to the treatment.
crop = 'soy'
sampling = 'seedling'
constraint = 'ROTATION'
partial = 'YEAR'
barcode = '16S'
contrast = 'None'
nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_name = paste(crop, sampling, barcode, '\npartial, constrained')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name)
perms = permute_ord(ord, metric='model_F', times=nperms, permute_on='partial')
full_ord_results %<>% update_full_ord_results(crop, sampling, barcode, perms)
## CROP SAMPLING BARCODE RSQ F_OBS Df P_VAL
## 1 soy seedling 16S 0.109189 1.137291 5,33 0.1555
ord_ls[[nn]] = perms # structure includes: permutation results, observed data (pcwOrd object).
ROTATION explains 10.9 of soy 16S community variation at seedling (p = 0.16). Therefore, we will test not contrasts with the 16S dataset.
barcode = 'ITS'
nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_name = paste(crop, sampling, barcode, '\npartial, constrained')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name)
perms = permute_ord(ord, metric='model_F', times=nperms, permute_on='partial')
full_ord_results %<>% update_full_ord_results(crop, sampling, barcode, perms)
## CROP SAMPLING BARCODE RSQ F_OBS Df P_VAL
## 1 soy seedling ITS 0.2026763 2.315203 5,33 1e-04
ord_ls[[nn]] = perms # structure includes: permutation results, observed data (pcwOrd object).
ROTATION explains 20.3 of soy ITS community variation at seedling (p = 10^{-4}). Therefore, we will test contrasts with the ITS dataset.
For compatibility with downstream code
contrast = 'FOLLOW_CORN'
barcode = '16S'
contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')
nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr)
perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)
The contrast has a negative score, so we will invert it.
contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
## CROP SAMPLING BARCODE CONTRAST RSQ F_OBS Df P_VAL
## 1 soy seedling 16S FOLLOW_CORN 0.0269337 1.108991 2,36 0.3428
ord_ls[[nn]] = perms
16S communities did not respond to ROTATION at seedling (p = 0.34), as expected based on full rotation/treatment results. We will not test it further or reduce features. We’ll just save the scores for plotting.
contrast = 'FOLLOW_CORN'
barcode = 'ITS'
contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')
nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr)
perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)
The contrast has a negative score, so we will invert it.
contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
## CROP SAMPLING BARCODE CONTRAST RSQ F_OBS Df P_VAL
## 1 soy seedling ITS FOLLOW_CORN 0.07422715 3.141276 2,36 2e-04
ord_ls[[nn]] = perms
ITS communities did respond to ROTATION at seedling (p = 210^{-4}). We will see if these communities predict soy biomass.
Here are the axis scores so far. Note we will NOT include 16S in the model.
find_ords = c(crop, sampling, contrast) %>%
sapply(grepl, names(ord_ls)) %>%
apply(1, all)
local_ords = ord_ls[find_ords] %>%
lapply(extract2, 'observed')
sem_name = paste(crop, sampling, contrast, sep='_')
#' combine axis (standard) scores plus response, contrast, and partial data into a single data.frame for modeling.
#' contrast_row is the row of contrasts[[crop]]. Must be a matrix.
mod_df = generate_prediction_df(local_ords,
crop,
sampling,
contrasts[[crop]][contrast,,drop=FALSE],
partial,
constraint,
response,
plot,
auto_invert=TRUE) # if contrast and axis are negatively correlated, for consistency.
head(mod_df)
Models are as previously described.
mod_df %<>% na.omit
no_mediation = {
"
# Model
AV_DW ~ YEAR + c*CONTRAST
AXIS_ITS ~ a*CONTRAST
AXIS_ITS ~~ 0*AV_DW
# Calculate pathways for easy interpretation
direct := c
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
partial_mediation_ITS = {
"
AV_DW ~ YEAR + b*AXIS_ITS + c*CONTRAST
AXIS_ITS ~ a*CONTRAST
direct := c
total := c + (a*b)
indirect := a*b
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
full_mediation_ITS = {
"
AV_DW ~ YEAR + b*AXIS_ITS# + c*CONTRAST
AXIS_ITS ~ a*CONTRAST
# direct := c
# total := c + (a*b)
indirect := a*b
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
model_compares = anova(no_mediation, partial_mediation_ITS, full_mediation_ITS) %T>%
print
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## partial_mediation_ITS 1 166.61 179.92 0.2246
## no_mediation 2 168.07 179.71 3.6808 3.4562 1 0.06302 .
## full_mediation_ITS 2 164.99 176.63 0.6025 -3.0783 0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
keep_mod = which.min(model_compares$AIC) %>%
rownames(model_compares)[.]
sem_ls[[sem_name]] = list(no_mediation = no_mediation,
partial_mediation_ITS = partial_mediation_ITS,
full_mediation_ITS = full_mediation_ITS,
df = mod_df)
The best model is full_mediation_ITS.
mod_summ = summary(get(keep_mod), std=TRUE)
## lavaan 0.6-5 ended normally after 21 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 7
##
## Number of observations 39
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 0.602
## Degrees of freedom 2
## P-value (Chi-square) 0.740
##
## Parameter Estimates:
##
## Standard errors Bootstrap
## Number of requested bootstrap draws 1000
## Number of successful bootstrap draws 1000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## AV_DW ~
## YEAR -1.165 0.216 -5.392 0.000 -1.165 -0.597
## AXIS_ITS (b) -0.423 0.117 -3.621 0.000 -0.423 -0.428
## AXIS_ITS ~
## FOLLOW_COR (a) 1.750 0.228 7.693 0.000 1.750 0.790
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .AV_DW 1.763 0.365 4.832 0.000 1.763 1.807
## .AXIS_ITS -0.022 0.100 -0.225 0.822 -0.022 -0.023
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .AV_DW 0.450 0.081 5.548 0.000 0.450 0.472
## .AXIS_ITS 0.366 0.098 3.726 0.000 0.366 0.376
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## indirect -0.740 0.232 -3.185 0.001 -0.740 -0.338
This is an excellent model.
lavaanPlot(model=get(keep_mod), coefs=TRUE)
select_barcode = 'ITS'
nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_effect(select_barcode, response, partial, constraint, contrast, mod_df, partial_response=TRUE, nn)
The taxa that load most credibly onto this axis are:
select_barcode = 'ITS'
is_ITS = select_barcode=='ITS'
if (is_ITS) {
depth = 'Species'
guilds = funguild
} else {
depth = 'Genus'
guilds = NULL
}
ord_obj = paste(crop, sampling, select_barcode, contrast, sep='_') %>%
ord_ls[[.]]
top_tax = id_top_tax(ord_obj,
phylo_obj[[select_barcode]],
metric='feature_F',
phylo_depth=depth,
max_p_adj=1,
max_p_val=1,
guilds=funguild,
auto_invert=TRUE) %>%
subset(pcAxis1^2 > 0.01)
tt_sort = order(top_tax$pcAxis1, decreasing=TRUE)
taxa_results %<>% update_taxa_results(crop, sampling, select_barcode, contrast, top_tax[tt_sort, ])
## CROP SAMPLING BARCODE CONTRAST ASV PHYLUM
## 1 soy seedling ITS FOLLOW_CORN ASV_32 Ascomycota
## 2 soy seedling ITS FOLLOW_CORN ASV_25 Ascomycota
## 3 soy seedling ITS FOLLOW_CORN ASV_64 Basidiomycota
## 4 soy seedling ITS FOLLOW_CORN ASV_14 Ascomycota
## 5 soy seedling ITS FOLLOW_CORN ASV_23 Ascomycota
## 6 soy seedling ITS FOLLOW_CORN ASV_65 Ascomycota
## 7 soy seedling ITS FOLLOW_CORN ASV_259 Ascomycota
## 8 soy seedling ITS FOLLOW_CORN ASV_58 Ascomycota
## 9 soy seedling ITS FOLLOW_CORN ASV_71 Ascomycota
## 10 soy seedling ITS FOLLOW_CORN ASV_29 Ascomycota
## 11 soy seedling ITS FOLLOW_CORN ASV_88 Ascomycota
## 12 soy seedling ITS FOLLOW_CORN ASV_52 Ascomycota
## 13 soy seedling ITS FOLLOW_CORN ASV_279 Ascomycota
## 14 soy seedling ITS FOLLOW_CORN ASV_278 Ascomycota
## 15 soy seedling ITS FOLLOW_CORN ASV_46 Ascomycota
## TAXA
## 1 Unknown Cladosporium
## 2 Dactylonectria macrodidyma
## 3 Ustilago maydis
## 4 Unknown Alternaria
## 5 Unknown Didymellaceae
## 6 Unknown Branch06
## 7 Unknown Neosetophoma
## 8 Penicillium decumbens
## 9 Unknown Paraphaeosphaeria
## 10 Mycosphaerella tassiana
## 11 Unknown Branch06
## 12 Corynespora cassiicola
## 13 Thelonectria rubrococca
## 14 Piniphoma wesendahlina
## 15 Unknown Branch06
## GUILD
## 1 Animal Pathogen-Endophyte-Lichen Parasite-Plant Pathogen-Wood Saprotroph
## 2 -
## 3 Plant Pathogen
## 4 Animal Pathogen-Endophyte-Plant Pathogen-Wood Saprotroph
## 5 -
## 6 -
## 7 Undefined Saprotroph
## 8 Dung Saprotroph-Undefined Saprotroph-Wood Saprotroph
## 9 Undefined Saprotroph
## 10 Plant Pathogen
## 11 -
## 12 Undefined Saprotroph
## 13 Leaf Saprotroph
## 14 -
## 15 -
## AXIS RSQ F_OBS P_VAL P_ADJ
## 1 0.3926261 0.22895505 12.4680595 0.0017 0.1472200
## 2 0.2270544 0.08400267 3.4219543 0.0608 0.4323127
## 3 0.1973184 0.14163798 6.5200039 0.0181 0.4299791
## 4 0.1777951 0.09546936 4.0386463 0.0523 0.4323127
## 5 0.1605666 0.09644298 3.9113190 0.0584 0.4323127
## 6 0.1587745 0.09614298 5.3484036 0.0300 0.4299791
## 7 0.1516496 0.20874898 10.1400855 0.0043 0.2068778
## 8 -0.1061957 0.02378775 0.8844087 0.3515 0.5896345
## 9 -0.1148074 0.15980403 6.9701302 0.0081 0.3188455
## 10 -0.1362238 0.27602223 14.2143370 0.0004 0.0433000
## 11 -0.1567958 0.08153767 3.2051756 0.0927 0.5078835
## 12 -0.1869930 0.06029506 2.8889799 0.0629 0.4323127
## 13 -0.2435872 0.06363983 2.7724728 0.0961 0.5078835
## 14 -0.2801628 0.36458901 20.7461134 0.0002 0.0433000
## 15 -0.4811718 0.23010605 10.7615763 0.0024 0.1670143
Quite the mix of taxa here.
crop = 'soy'
sampling = 'flowering'
constraint = 'ROTATION'
partial = 'YEAR'
barcode = '16S'
contrast = 'None'
nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_name = paste(crop, sampling, barcode, '\npartial, constrained')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name)
perms = permute_ord(ord, metric='model_F', times=nperms, permute_on='partial')
full_ord_results %<>% update_full_ord_results(crop, sampling, barcode, perms)
## CROP SAMPLING BARCODE RSQ F_OBS Df P_VAL
## 1 soy flowering 16S 0.1002728 1.118494 5,34 0.2338
ord_ls[[nn]] = perms # structure includes: permutation results, observed data (pcwOrd object).
ROTATION explains 10 of soy 16S community variation at flowering (p = 0.23). Therefore, we will not test contrasts with the 16S dataset.
barcode = 'ITS'
nn = paste(crop, sampling, barcode, contrast, sep='_')
plot_name = paste(crop, sampling, barcode, '\npartial, constrained')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name)
perms = permute_ord(ord, metric='model_F', times=nperms, permute_on='partial')
full_ord_results %<>% update_full_ord_results(crop, sampling, barcode, perms)
## CROP SAMPLING BARCODE RSQ F_OBS Df P_VAL
## 1 soy flowering ITS 0.1546665 1.623074 5,33 0.0119
ord_ls[[nn]] = perms # structure includes: permutation results, observed data (pcwOrd object).
ROTATION explains 15.5 of soy ITS community variation at flowering (p = 0.012). Therefore, we will test contrasts with the ITS dataset.
For compatibility with downstream code
contrast = 'FOLLOW_CORN'
barcode = '16S'
contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')
nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr)
perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)
contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
## CROP SAMPLING BARCODE CONTRAST RSQ F_OBS Df P_VAL
## 1 soy flowering 16S FOLLOW_CORN 0.01984165 0.8714307 2,37 0.794
ord_ls[[nn]] = perms
16S communities did not respond to ROTATION at flowering (p = 0.79), as expected based on full rotation/treatment results. We will not test it further or reduce features. We’ll just save the scores for plotting.
barcode = 'ITS'
contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')
nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr)
perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)
The contrast has a negative score, so we will invert it.
contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
## CROP SAMPLING BARCODE CONTRAST RSQ F_OBS Df P_VAL
## 1 soy flowering ITS FOLLOW_CORN 0.0698402 2.886661 2,36 0.0043
ord_ls[[nn]] = perms
ITS communities did respond to ROTATION at flowering (p = 0.0043). We will see if these communities predict soy biomass.
Here are the axis scores so far. Note we will NOT include 16S in the model.
find_ords = c(crop, sampling, contrast) %>%
sapply(grepl, names(ord_ls)) %>%
apply(1, all)
local_ords = ord_ls[find_ords] %>%
lapply(extract2, 'observed')
sem_name = paste(crop, sampling, contrast, sep='_')
mod_df = generate_prediction_df(local_ords,
crop,
sampling,
contrasts[[crop]][contrast,,drop=FALSE],
partial,
constraint,
response,
plot,
auto_invert=TRUE) # if contrast and axis are negatively correlated, for consistency.
# Add T1 biomass
seed_bio = subset(plot, SAMPLING=='seedling' & CROP==crop, select=c('SAMPLE_ID', 'AV_DW'))
colnames(seed_bio)[2] = 'T1_AV_DW'
seed_bio$SAMPLE_ID %<>% as.character %>%
strsplit('_') %>%
sapply(function(x) paste(x[1:3], collapse='_'))
mod_df$SID = mod_df$SAMPLE_ID %>%
as.character %>%
strsplit('_') %>%
sapply(function(x) paste(x[1:3], collapse='_'))
mod_df %<>% merge(seed_bio, by.x='SID', by.y='SAMPLE_ID')
head(mod_df)
The SEM models are as described above.
mod_df %<>% na.omit
no_mediation = {
"
# Model
AV_DW ~ YEAR + c*CONTRAST + T1_AV_DW
AXIS_ITS ~ a*CONTRAST
# covariances
AXIS_ITS ~~ 0*AV_DW
# Calculate pathways for easy interpretation
direct := c
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
partial_mediation_ITS = {
"
AV_DW ~ YEAR + b*AXIS_ITS + c*CONTRAST + T1_AV_DW
AXIS_ITS ~ a*CONTRAST
direct := c
total := c + (a*b)
indirect_its := a*b
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
full_mediation_ITS = {
"
AV_DW ~ YEAR + b*AXIS_ITS + T1_AV_DW
AXIS_ITS ~ a*CONTRAST
# covariances
indirect_its := a*b
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
model_compares = anova(no_mediation, partial_mediation_ITS, full_mediation_ITS) %T>%
print
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## partial_mediation_ITS 2 173.55 188.29 7.1056
## no_mediation 3 171.60 184.70 7.1573 0.0517 1 0.8201
## full_mediation_ITS 3 183.16 196.26 18.7119 11.5546 0
keep_mod = which.min(model_compares$AIC) %>%
rownames(model_compares)[.]
sem_ls[[sem_name]] = list(no_mediation = no_mediation,
partial_mediation_ITS = partial_mediation_ITS,
full_mediation_ITS = full_mediation_ITS,
df = mod_df)
The best model is no_mediation.
mod_summ = summary(get(keep_mod), std=TRUE)
## lavaan 0.6-5 ended normally after 25 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 8
##
## Number of observations 38
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 7.157
## Degrees of freedom 3
## P-value (Chi-square) 0.067
##
## Parameter Estimates:
##
## Standard errors Bootstrap
## Number of requested bootstrap draws 1000
## Number of successful bootstrap draws 1000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## AV_DW ~
## YEAR 1.314 0.287 4.580 0.000 1.314 0.666
## FOLLOW_COR (c) -1.023 0.248 -4.133 0.000 -1.023 -0.468
## T1_AV_DW 2.993 1.461 2.048 0.041 2.993 0.288
## AXIS_ITS ~
## FOLLOW_COR (a) 1.412 0.286 4.933 0.000 1.412 0.646
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .AV_DW ~~
## .AXIS_ITS 0.000 0.000 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .AV_DW -3.222 0.974 -3.309 0.001 -3.222 -3.265
## .AXIS_ITS -0.019 0.123 -0.151 0.880 -0.019 -0.019
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .AV_DW 0.363 0.101 3.593 0.000 0.363 0.373
## .AXIS_ITS 0.567 0.106 5.376 0.000 0.567 0.582
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## direct -1.023 0.248 -4.131 0.000 -1.023 -0.468
This is an poor model.
lavaanPlot(model=get(keep_mod), coefs=TRUE)
Fungi did not mediate the effect of FOLLOW_CORN on soy biomass at flowering.
For compatibility with downstream code
contrast = 'ROTATION_LENGTH'
barcode = '16S'
contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')
nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr)
perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)
The contrast has a negative score, so we will invert it.
contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
## CROP SAMPLING BARCODE CONTRAST RSQ F_OBS Df P_VAL
## 1 soy flowering 16S ROTATION_LENGTH 0.02291875 1.010265 2,37 0.529
ord_ls[[nn]] = perms
16S communities did not respond to ROTATION at flowering (p = 0.53), as expected based on full rotation/treatment results. We will not test it further or reduce features. We’ll just save the scores for plotting.
barcode = 'ITS'
contr = contrasts[[crop]][contrast, , drop=FALSE]
plot_name = paste(crop, sampling, barcode, contrast, '\npartial, constrained')
nn = paste(crop, sampling, barcode, contrast, sep='_')
ord = ordinate(phylo_obj[[barcode]], crop, sampling, constraint, partial, plot_name, contr)
perms = permute_ord(ord, metric=c('model_F', 'feature_F', 'feature_scores'), axes=1, times=nperms)
The contrast has a negative score, so we will invert it.
contrast_ord_results %<>% update_contrast_ord_results(crop, sampling, barcode, contrast, perms)
## CROP SAMPLING BARCODE CONTRAST RSQ F_OBS Df P_VAL
## 1 soy flowering ITS ROTATION_LENGTH 0.03081713 1.219124 2,36 0.2191
ord_ls[[nn]] = perms
ITS communities did not respond to ROTATION at flowering (p = 0.22). We will see if these communities predict soy biomass.
Here are the axis scores so far. Note we will NOT include 16S in the model.
find_ords = c(crop, sampling, contrast) %>%
sapply(grepl, names(ord_ls)) %>%
apply(1, all)
local_ords = ord_ls[find_ords] %>%
lapply(extract2, 'observed')
sem_name = paste(crop, sampling, contrast, sep='_')
mod_df = generate_prediction_df(local_ords,
crop,
sampling,
contrasts[[crop]][contrast,,drop=FALSE],
partial,
constraint,
response,
plot,
auto_invert=TRUE) # if contrast and axis are negatively correlated, for consistency.
# Add T1 biomass
seed_bio = subset(plot, SAMPLING=='seedling' & CROP==crop, select=c('SAMPLE_ID', 'AV_DW'))
colnames(seed_bio)[2] = 'T1_AV_DW'
seed_bio$SAMPLE_ID %<>% as.character %>%
strsplit('_') %>%
sapply(function(x) paste(x[1:3], collapse='_'))
mod_df$SID = mod_df$SAMPLE_ID %>%
as.character %>%
strsplit('_') %>%
sapply(function(x) paste(x[1:3], collapse='_'))
mod_df %<>% merge(seed_bio, by.x='SID', by.y='SAMPLE_ID')
head(mod_df)
The SEM models are as described above.
mod_df %<>% na.omit
no_mediation = {
"
# Model
AV_DW ~ YEAR + c*CONTRAST + T1_AV_DW
AXIS_ITS ~ a*CONTRAST
# covariances
AXIS_ITS ~~ 0*AV_DW
# Calculate pathways for easy interpretation
direct := c
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
partial_mediation_ITS = {
"
AV_DW ~ YEAR + b*AXIS_ITS + c*CONTRAST + T1_AV_DW
AXIS_ITS ~ a*CONTRAST
direct := c
total := c + (a*b)
indirect_its := a*b
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
full_mediation_ITS = {
"
AV_DW ~ YEAR + b*AXIS_ITS + T1_AV_DW
AXIS_ITS ~ a*CONTRAST
# covariances
indirect_its := a*b
"
} %>%
gsub('CONTRAST', contrast, .) %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
model_compares = anova(no_mediation, partial_mediation_ITS, full_mediation_ITS) %T>%
print
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## partial_mediation_ITS 2 181.79 196.53 4.6383
## no_mediation 3 180.31 193.41 5.1596 0.52135 1 0.4703
## full_mediation_ITS 3 181.43 194.53 6.2767 1.11705 0
keep_mod = which.min(model_compares$AIC) %>%
rownames(model_compares)[.]
sem_ls[[sem_name]] = list(no_mediation = no_mediation,
partial_mediation_ITS = partial_mediation_ITS,
full_mediation_ITS = full_mediation_ITS,
df = mod_df)
The best model is no_mediation.
mod_summ = summary(get(keep_mod), std=TRUE)
## lavaan 0.6-5 ended normally after 28 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 8
##
## Number of observations 38
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 5.160
## Degrees of freedom 3
## P-value (Chi-square) 0.160
##
## Parameter Estimates:
##
## Standard errors Bootstrap
## Number of requested bootstrap draws 1000
## Number of successful bootstrap draws 1000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## AV_DW ~
## YEAR 1.573 0.349 4.510 0.000 1.573 0.797
## ROTATION_L (c) -0.531 0.167 -3.183 0.001 -0.531 -0.261
## T1_AV_DW 5.305 1.467 3.617 0.000 5.305 0.510
## AXIS_ITS ~
## ROTATION_L (a) 1.332 0.212 6.281 0.000 1.332 0.654
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .AV_DW ~~
## .AXIS_ITS 0.000 0.000 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .AV_DW -4.611 1.066 -4.326 0.000 -4.611 -4.672
## .AXIS_ITS 0.026 0.121 0.217 0.828 0.026 0.027
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .AV_DW 0.464 0.132 3.528 0.000 0.464 0.477
## .AXIS_ITS 0.557 0.161 3.459 0.001 0.557 0.572
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## direct -0.531 0.167 -3.181 0.001 -0.531 -0.261
This is a good model.
lavaanPlot(model=get(keep_mod), coefs=TRUE)
Fungi did not mediate the effect of ROTATION_LENGTH on soy biomass at flowering.
Full model predicting yield based on above results.
Approach:
sem_name = 'soy_combined'
models = c('soy_seedling_FOLLOW_CORN',
'soy_flowering_ROTATION_LENGTH')
plot_sub = c('SAMPLE_ID', 'PLOT', 'YEAR', 'CROP', 'CROPYIELD') %>%
plot[, .] %>%
subset(!is.na(CROPYIELD))
plot_sub$SID = with(plot_sub, paste(CROP, YEAR, PLOT, sep='_'))
seed = sem_ls[[models[1]]]$df %>%
extract(c('SAMPLE_ID', 'AXIS_ITS', 'FOLLOW_CORN'))
colnames(seed)[c(2,3)] %<>% paste('T1', ., sep='_')
seed$SID = seed$SAMPLE_ID %>%
as.character %>%
strsplit('_') %>%
lapply(extract, 1:3) %>%
lapply(paste, collapse='_')
seed$SAMPLE_ID = NULL
flow = sem_ls[[models[2]]]$df %>%
extract(c('SID', 'ROTATION_LENGTH', 'AV_DW', 'T1_AV_DW'))
colnames(flow)[c(2, 3)] %<>% paste('T2', ., sep='_')
mod_df = merge(plot_sub, seed, by='SID') %>%
merge(flow, by='SID')
mod_df %<>% na.omit
no_microbe = {
"
CROPYIELD ~ YEAR + T2_AV_DW# + T2_ROTATION_LENGTH + T1_FOLLOW_CORN
T2_AV_DW ~ YEAR + T2_ROTATION_LENGTH + T1_AV_DW + T1_FOLLOW_CORN
T1_AV_DW ~ YEAR + T1_FOLLOW_CORN
T1_AXIS_ITS ~ T1_FOLLOW_CORN
"
} %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
full_microbe = {
"
CROPYIELD ~ YEAR + a*T2_AV_DW# + T2_ROTATION_LENGTH + T1_FOLLOW_CORN
T2_AV_DW ~ YEAR + T2_ROTATION_LENGTH + b*T1_AV_DW + T1_FOLLOW_CORN
T1_AV_DW ~ YEAR + d*T1_AXIS_ITS
T1_AXIS_ITS ~ T1_FOLLOW_CORN
# microbe paths
t1_microbe := a*b*d
"
} %>%
sem(data=mod_df, se=sem_error, std.ov=TRUE, missing='ML')
model_compares = anova(no_microbe, full_microbe) %T>% print
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## no_microbe 8 270.82 300.30 12.2709
## full_microbe 9 264.94 292.78 8.3893 -3.8815 1 1
keep_mod = which.min(model_compares$AIC) %>%
rownames(model_compares)[.]
sem_ls[[sem_name]] = list(no_microbe = no_microbe,
full_microbe = full_microbe,
df = mod_df)
full_microbe is the best, by -5.882 AIC units.
lavaanPlot(model=get(keep_mod), coefs=TRUE, stand=TRUE)
summary(full_microbe, std=TRUE)
## lavaan 0.6-5 ended normally after 33 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 17
##
## Number of observations 38
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 8.389
## Degrees of freedom 9
## P-value (Chi-square) 0.495
##
## Parameter Estimates:
##
## Standard errors Bootstrap
## Number of requested bootstrap draws 1000
## Number of successful bootstrap draws 1000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## CROPYIELD ~
## YEAR 1.230 0.129 9.546 0.000 1.230 0.624
## T2_AV_DW (a) 0.451 0.082 5.526 0.000 0.451 0.446
## T2_AV_DW ~
## YEAR 1.163 0.257 4.520 0.000 1.163 0.595
## T2_ROTATIO -0.634 0.155 -4.080 0.000 -0.634 -0.315
## T1_AV_DW (b) 0.198 0.128 1.546 0.122 0.198 0.199
## T1_FOLLOW_ -1.108 0.239 -4.634 0.000 -1.108 -0.512
## T1_AV_DW ~
## YEAR -1.193 0.210 -5.671 0.000 -1.193 -0.609
## T1_AXIS_IT (d) -0.435 0.114 -3.827 0.000 -0.435 -0.439
## T1_AXIS_ITS ~
## T1_FOLLOW_ 1.738 0.224 7.759 0.000 1.738 0.795
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .CROPYIELD -1.845 0.174 -10.596 0.000 -1.845 -1.872
## .T2_AV_DW -1.742 0.385 -4.524 0.000 -1.742 -1.784
## .T1_AV_DW 1.790 0.358 5.006 0.000 1.790 1.828
## .T1_AXIS_ITS -0.023 0.100 -0.230 0.818 -0.023 -0.023
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .CROPYIELD 0.122 0.034 3.599 0.000 0.122 0.126
## .T2_AV_DW 0.273 0.081 3.365 0.001 0.273 0.286
## .T1_AV_DW 0.430 0.082 5.235 0.000 0.430 0.449
## .T1_AXIS_ITS 0.358 0.101 3.551 0.000 0.358 0.368
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## t1_microbe -0.039 0.028 -1.368 0.171 -0.039 -0.039
In both corn and soybean, we see the following pattern:
At the seedling stage, the effect of the previous crop is strong. Soybean growing after corn, and corn growing after soybean, has lowerbiomass than if growing after alternate crops. The fungal community fully explains the previous crop effect. The fungal taxa that are encouraged by corn/soybean as the previous crop tend to be pathogens, suggesting they are suppressing the current crop’s growth.
At the flowering stage, the effect of rotation length dominates, with crops growing in long rotations having higher biomass. The microbial community again fully explains the rotation length effect, at least for corn - bacteria is slightly better than fungi in the case of corn. Soybean show mild evidence of the fungal community partially mediation rotation length (slightly lower χ2), but this is not as parsimonious as the no-mediation model.
To save: 1. Ordination data (with permutations) 2. SEM models and data.frames 3. list of predictive taxa
# print('update')
out_obj = 'Microbes as Mechanism Results.Rdata'
save(full_ord_results, contrast_ord_results, taxa_results, ord_ls, sem_ls,
file=here('Data', out_obj))